Streaming sync serialization #287

Open

bjester wants to merge 10 commits into learningequality:release-v0.9.x from
Summary
- streamz was considered, but I unfortunately opted against it because it uses tornado
- Refactors `_serialize_into_store` logic into individual classes built upon foundational stream utilities (see the sketch after this list) -- so much better for unit testing!
- Adds `typing-extensions` for backported future typing features
- Updates `MorangoProfileController` to use a `sync_filter` kwarg instead of `filter` -- always bothered me that it shadowed the built-in
- Replaces `_serialize_into_store` with the new streaming `serialize_into_store`
- Avoids `bulk_update`, as Django was observed to spend excessive time in it
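As a rough illustration of the refactor's shape (all class and function names below are hypothetical, not the ones this PR introduces): small, composable stage classes consume and yield records lazily, so each stage can be unit tested in isolation with plain lists.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical names for illustration only -- the actual stream utilities
# and serialization classes introduced by this PR will differ.
class Stage(object):
    """A composable pipeline stage: consumes an iterable, yields records lazily."""

    def __call__(self, source: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
        raise NotImplementedError


class FilterStage(Stage):
    """Drops records outside the sync filter (placeholder predicate)."""

    def __call__(self, source):
        return (record for record in source if record.get("dirty"))


class SerializeStage(Stage):
    """Turns each model record into a store-record dict, one at a time."""

    def __call__(self, source):
        for record in source:
            yield {"id": record["id"], "serialized": str(record)}


def run_pipeline(source, *stages):
    """Threads the source through each stage without materializing the stream."""
    stream = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


# Each stage is trivially unit-testable in isolation:
assert list(run_pipeline([{"id": 1, "dirty": True}], FilterStage(), SerializeStage()))
```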
Improvements

The changes were evaluated by installing the local version of Morango into Kolibri. A dedicated command was created within Kolibri to run solely the serialization step, and the performance of that command was then benchmarked.
Further investigation will be required to determine how to reduce the increased duration.
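A minimal sketch of that kind of management command, assuming a `serialize_into_store`-style entry point (the import path and call signature here are assumptions, not the PR's actual API):

```python
# Sketch of a benchmark-only Django management command. The import path and
# signature of the serialization entry point are assumptions.
import time

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Run solely the morango serialization step and time it."

    def handle(self, *args, **options):
        # Hypothetical import path; the real entry point may live elsewhere.
        from morango.sync.operations import serialize_into_store

        start = time.perf_counter()
        serialize_into_store()  # assumed zero-arg invocation for illustration
        elapsed = time.perf_counter() - start
        self.stdout.write("Serialization took {:.2f}s".format(elapsed))
```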
Case 1: existing large dataset
Kolibri was launched with a pre-existing database containing data for about 18,000 users.
Case 2: artificial 500 users
Kolibri's `generateuserdata` command was used to generate data for 500 users, which is the maximum the command currently supports.
Case 3: large dataset reduced -- 1000 users

Since the `generateuserdata` command currently can only generate up to 500 users, the existing large dataset was trimmed down to 1000 users. After manually deleting the other users, `kolibri manage` was executed (as a no-op) to trigger Kolibri's FK integrity check, which deletes the broken records. Note that this case probably takes longer due to the deletions, which provides additional insight into the process, even though the deletion processing has not really changed.
Case 4: large dataset reduced -- 5000 users

Again, the existing large dataset was trimmed down, this time to 5000 users. The deletion behavior is the same as in Case 3.
How AI was used
TODO
Reviewer guidance
Issues addressed
Closes #192